A Framework for Authorial Clustering of Shorter Texts in Latent Semantic Spaces
Abstract
Authorial clustering involves the grouping of documents written by the same author or team of authors without any prior positive examples of an author's writing style or thematic preferences. For authorial clustering on shorter texts (paragraph-length texts that are typically shorter than conventional documents), the document representation is particularly important. We propose a high-level framework which utilizes a compact data representation in a latent feature space derived with non-parametric topic modeling. Authorial clusters are identified thereafter in two scenarios: (a) fully unsupervised and (b) semi-supervised, where a small number of shorter texts are known to belong to the same author (must-link constraints) or not (cannot-link constraints). We report on experiments with 120 collections in three languages and two genres and show that the topic-based latent feature space provides a promising level of performance while reducing the dimensionality by a factor of 1500 compared to the state of the art. We also demonstrate that a little knowledge of constraints on authorial cluster memberships leads to auspicious improvements on this difficult task.
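The semi-supervised scenario (b) can be illustrated with a minimal sketch. This is not the framework's actual algorithm, which the abstract does not specify. It is a hypothetical greedy agglomerative procedure that first merges must-link pairs and then refuses any merge that would break a cannot-link constraint. The function name `constrained_clusters`, the toy two-dimensional vectors (standing in for topic proportions in a latent space), and the constraint pairs are all assumptions made for illustration.

```python
from itertools import combinations
import math


def constrained_clusters(vectors, k, must_link=(), cannot_link=()):
    """Greedy agglomerative clustering that honors pairwise constraints.

    Hypothetical illustration only: not the method used in the paper.
    """
    # Start with one cluster per document.
    clusters = [{i} for i in range(len(vectors))]

    def find(i):
        # Return the cluster that currently contains document i.
        return next(c for c in clusters if i in c)

    def violates(a, b):
        # True if merging a and b would join a cannot-link pair.
        return any((i in a and j in b) or (i in b and j in a)
                   for i, j in cannot_link)

    def merge(a, b):
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)

    def centroid(c):
        dim = len(vectors[0])
        return [sum(vectors[i][d] for i in c) / len(c) for d in range(dim)]

    # Must-link pairs are merged up front.
    for i, j in must_link:
        a, b = find(i), find(j)
        if a is not b:
            merge(a, b)

    # Merge the closest admissible pair until k clusters remain.
    while len(clusters) > k:
        pairs = [(a, b) for a, b in combinations(clusters, 2)
                 if not violates(a, b)]
        if not pairs:
            break  # every remaining merge would break a cannot-link constraint
        a, b = min(pairs,
                   key=lambda p: math.dist(centroid(p[0]), centroid(p[1])))
        merge(a, b)

    return sorted(sorted(c) for c in clusters)


# Toy latent vectors (assumed topic proportions); document 4 is ambiguous.
docs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.5, 0.5]]
result = constrained_clusters(docs, k=2,
                              must_link=[(4, 0)], cannot_link=[(4, 2)])
print(result)
```

In this toy setup, document 4 lies exactly between the two groups, so the two constraints alone decide which cluster it joins.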
Similar resources
A Framework for Identifying and Prioritizing Factors Affecting Customers' Online Shopping Behavior in Iran
The purpose of this study is to identify the factors that lead customers to shop online in Iran and to investigate the importance of those factors in online customers' decisions. In the identification phase, to discover the factors affecting the online shopping behavior of customers in Iran, the derived reference model summarizing antecedents of online shopping proposed by Chang et al. was us...
Latent Semantic Space for Web Clustering
Organizing a huge number of Web pages into topics according to their relevance is an efficient and effective method for information retrieval. Latent Semantic Space (LSS), naturally in the form of some geometric structure in Combinatorial Topology, has been proposed for unstructured document clustering. Given a set of Web pages, the set of associations among frequently co-occurring terms in t...
The Role of Semantic and Communicative Translation on Reading Comprehension of Scientific Texts
The following null hypothesis was proposed: H0: there is no significant difference between the use of semantically or communicatively translated scientific texts. To test the null hypothesis, a number of procedures were followed. First, two passages were selected from sourcebooks of the food and nutrition industry and gardening disciplines. Each, in turn, was followed by a number of comprehension quest...
Fuzzy clustering of semantic spaces
In this paper the GK` model for the construction of thesaurus classes based on fuzzy semantic association measure between index terms and concepts (thesaurus classes) is presented. The association measure is obtained on the basis of fuzzy semantic relations between
Double Clustering in Latent Semantic Indexing
Document clustering is a widely researched area of information retrieval. The large amount of documents which must be handled needs automatic organizing. A popular approach to clustering documents and messages is the vector space model, which represents texts with feature vectors, usually generated from the set of terms contained in the message. The clustering based on the document-term frequen...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-74251-5_24